Part 1: Introduction


Project Goal

Our project examines the impact of the COVID-19 pandemic on different aspects of the economy such as GDP and unemployment rate. We decided to focus our analysis on four states that had varying coronavirus restrictions.

#Economy Visualization

The global pandemic this past year has impacted many aspects of our normalcy and daily lives. We utilized the Twitter Developer API to better understand how people across the world were thinking about the current state of our economy, by gathering the 600 most recent tweets published with #economy and creating a word cloud to visualize our findings. The top words in the word cloud are “covid”, “business”, followed by “jobs”, “government”, and “globalhealth” to name a few.

 

Hypothesis

We predict a linear correlation between unemployment rates and number of cases. As the number of COVID-19 cases increased, local and state governments entered lockdowns and further quarantine restrictions which impacted business and services, leading to higher levels of unemployment. In contrast, we hypothesize that an increasing number of deaths may not have had an equivalently strong correlation with unemployment rates. There are a few reasons for this: (1) cases spread and are recorded more quickly, (2) the number of cases is drastically higher than number of deaths, (3) there seemed to be a long-term misconception that only the elderly and immunocompromised (who might represent a smaller subset of the general population) were at high risk for severe loss due to the virus.

In terms of GDP per State, we have chosen California, Texas, Rhode Island, and Utah to examine in detail. We predict that California might have experienced an increase in GDP, due to the number of technology companies residing in the State of California who reported record amounts of sales during the shift to a remote, virtual world. We briefly researched this topic with the California State Assembly. California and Texas each have large populations and were at the top of the charts at distinct points of time for cases/deaths, but followed drastically contrasting responses to the pandemic in terms of lockdowns, mask mandates, etc. We hypothesize that California would have seen an increase or at least continuation of GDP from previous years whereas Texas might have experienced a decline. Rhode Island and Utah are closely related in population amounts, however, Utah was the one of the only states to not issue a lockdown order. Specific counties in Utah with larger populations or tourist sites did issue lockdown orders, whereas Rhode Island’s governor issued a stay at home order in late March itself. We briefly researched stay at home orders through the New York Times. We predict that Rhode Island will experience a steady GDP in 2020 whereas Utah might experience a decline due to tourist locations and highly populated areas (with the highest job density) were the only ones experiencing lockdowns.

 

Variables
variable description
U.S. COVID-19 Cases Number of COVID-19 positive cases recorded each day in each state; this data has been wrangled into monthly national data from January 2020 to March 2021.
U.S. COVID-19 Deaths Number of deaths from COVID-19 recorded each day in each state; this data has been wrangled into monthly national data from January 2020 to March 2021.
U.S. Unemployment Rate per Month National recorded unemployment rates on a monthly basis from January 2020 to March 2021.
U.S. GDP Percent Change per Year National monthly change in unemployment rates on a monthly basis from January 2020 to March 2021.
Industry Arts, Health, Food, Eeal Estate, and Technical.
States California, Texas, Utah, and Rhode Island.

Code

Load libraries
# load twitter library, and all other libraries 
library(rtweet)
library(ggplot2)
library(dplyr)
library(tidytext)
Setting up the API
# name of created app
appname <- "economic-impact-of-covid19"

## api key 
key <- "LDVYl0gTvhvYbNnsiJd3IrTMk"

## api secret
secret <- "HnBymm96BOg1SYZKxvgURKc7ULJ8MKE0Fnj0ZjFfWqey7e34FQ"

# create token named "twitter_token"
twitter_token <- create_token(
  app = appname,
  consumer_key = key,
  consumer_secret = secret,
  access_token = '2620473211-WWfgOYF0UiZM0yLw3K0T3WxlG5a2cyH5heHimhC',
  access_secret = 'Z6LWeugQ0mi7ANivINfUvWndSkT0ic3gceF0YQn2MC4jY')
Gather up Tweets
data <- search_tweets("#economy", n = 600, include_rts = FALSE)
Display Tweets
head(data$text)
## [1] "With increase in inflation rate and consistent interest rate, real interest rate of #Pakistan remains negative.\n\n#economy #KSE100 #interestrates #inflation https://t.co/dvvfm9lLmY"                                                                                                                           
## [2] "#cybercrime, #Darkside gang now target #StockMarket organizations. @RecordedFuture #CyberSecurity experts: The #Ransomware group will notify info to crooked market traders in advance, so they can short a company’s #stockprice. #infosec #economy https://t.co/mgVypG2IwX"                                    
## [3] "“This graph shows the amount that the #German Central Bank has lent to the @ecb . This amount is owed mainly by #Spain &amp; #Italy. It clearly shows the risk of a breakout in the #Eurozone  At any moment there will be a group that demands to end this.” #economy https://t.co/zEdVFaSf5J"                  
## [4] "#AntiHomeless #architecture under #Capitalism vs under #Socialism (well, under #Communism is #housing totally for free, but yeah, in a #Socialist #economy still exist #money)\n\n#homelessness \n#socialissues https://t.co/Xn0nxU496D"                                                                         
## [5] "Wealth Morning<U+2600><U+FE0F>May 3rd,2021\nBreaking News from PAK &amp; Around the World\n\n<U+0001F53A>#Pakistan‘s #exports crossed $2 Billion for 7 months<U+0001F1F5><U+0001F1F0>\n<U+0001F53A>#UAE 2020 #economy shrank 6.1% amid #COVID19 <U+0001F1E6><U+0001F1EA>\n<U+0001F53A>#Asia #stocks to slow start; #WallStreet extended its #bullrun <U+0001F4C8>\n...\nhttps://t.co/hpBo0N3Kgh https://t.co/viLjJunNtQ"
## [6] "#April factory activity picks up, prices surge fastest in 7 years.\n#economy #India #Nifty #Investment #investing"
Cleaning the data
# Load libraries 
library(wordcloud)
library(RColorBrewer)
library(wordcloud2)
library(tm)

# Create a vector containing only the text
text <- data$text

# Create a corpus  
docs <- Corpus(VectorSource(text))

# Clean data 
docs <- docs %>%
  tm_map(removeNumbers) %>%
  tm_map(removePunctuation) %>%
  tm_map(stripWhitespace)
## Warning in tm_map.SimpleCorpus(., removeNumbers): transformation drops documents
## Warning in tm_map.SimpleCorpus(., removePunctuation): transformation drops
## documents
## Warning in tm_map.SimpleCorpus(., stripWhitespace): transformation drops
## documents
docs <- tm_map(docs, content_transformer(tolower))
## Warning in tm_map.SimpleCorpus(docs, content_transformer(tolower)):
## transformation drops documents
docs <- tm_map(docs, removeWords, stopwords("english"))
## Warning in tm_map.SimpleCorpus(docs, removeWords, stopwords("english")):
## transformation drops documents
Organizing the text data
dtm <- TermDocumentMatrix(docs) 
matrix <- as.matrix(dtm) 
words <- sort(rowSums(matrix),decreasing=TRUE) 
df <- data.frame(word = names(words),freq=words)
Generating the word cloud
# Generate the word cloud
set.seed(1234)  
wordcloud(words = df$word, freq = df$freq, min.freq = 1, max.words=200, random.order=FALSE, rot.per=0.35, colors=brewer.pal(8, "Dark2"))

 

Part 2: COVID-19 Data Analysis


Visualization

Data

Let’s take a look at the total counts of new COVID-19 cases and deaths recorded each month in the U.S. This data has been reported by the U.S. Center for Disease Control in their data set: “United States COVID-19 Cases and Deaths by State over TimeCase Surveillance”. We have cleaned and organized daily COVID-19 reported data by state into national data by month.

##    months      dates newcases deaths
## 1   JAN20 2020-01-01        7      0
## 2   FEB20 2020-02-01       32      0
## 3   MAR20 2020-03-01   188138   3680
## 4   APR20 2020-04-01   875947  55007
## 5   MAY20 2020-05-01   725178  42053
## 6   JUN20 2020-06-01   847821  21634
## 7   JUL20 2020-07-01  1924412  28249
## 8   AUG20 2020-08-01  1470219  29184
## 9  SEPT20 2020-09-01  1217378  22327
## 10  OCT20 2020-10-01  1928056  24012
## 11  NOV20 2020-11-01  4400344  39397
## 12  DEC20 2020-12-01  6395806  79189
## 13  JAN21 2020-01-01  6098794  97095
## 14  FEB21 2020-02-01  2354530  63431
## 15  MAR21 2020-03-01  1773722  33041

Now, let’s look at how counts of new cases per month arose and fell from Jan ’20 to Mar ’21:

New COVID-19 Cases per Month

This bar plot helps us visually time line the spread COVID-19. The red bars denote the top five months which experienced the highest number of recorded cases, often referred to as peaks or “second, third” waves. We can infer that July ’20, Oct ’20, Nov ’20, Dec ’20, Jan ’21, and Feb ’21 had the highest number of recorded cases; from this data, we might predict that unemployment rates would be higher during these months and GDP would be lower.

Now, let’s look at how counts of new deaths due to COVID-19 per month arose and fell from Jan ’20 to Mar ’21:

New COVID-19 Deaths per Month

This bar plot reports the months which recorded the highest number of deaths due to COVID-19; the bars in red signify the top five months with the highest amount of deaths. These months include: Apr ’20, May ’20, Dec ’20, Jan ’21, and Feb ’21. In correlation with the highest number of cases recorded, the months of Dec ’20 through Feb ’21 had both the highest number of cases and deaths. It is surprising that April and May of 2020 had two of the highest number of deaths although they didn’t have a peak in cases. However, Apr/May were early months of the pandemic where we had little knowledge about the virus, and their average case count is around 800K which might have propelled the peak in July of 1.9 million cases.

 

Code

Load libraries
library(magrittr) 
library(dplyr) 
library(ggplot2)
library(plotly)
library(lubridate)
library(grid)
library(gridExtra)
library("tidyverse")
Load datasets
ur.df <- read.csv("C:/Users/Rinija/Documents/GitHub/economic-impact-of-covid19/Datasets/Unemployment-Monthly.csv")

covid19.health <- read.csv("C:/Users/Rinija/Documents/GitHub/economic-impact-of-covid19/Datasets/United_States_COVID-19_Cases_and_Deaths_by_State_over_Time.csv")
Cleaning and organizing unemployment rate dataframe
# Dropping unused columns 
ur.cleaned.df <- subset(ur.df, select = -c(Series.ID, Year, Period))

# Renaming columns 
colnames(ur.cleaned.df) <- c("Time_Period", "Unemployment_Rate", "Monthly_Change")

#Result 
ur.cleaned.df
##    Time_Period Unemployment_Rate Monthly_Change
## 1     2020 Jan               3.5           -2.8
## 2     2020 Feb               3.5            0.0
## 3     2020 Mar               4.4           25.7
## 4     2020 Apr              14.8          236.4
## 5     2020 May              13.3          -10.1
## 6     2020 Jun              11.1          -16.5
## 7     2020 Jul              10.2           -8.1
## 8     2020 Aug               8.4          -17.6
## 9     2020 Sep               7.8           -7.1
## 10    2020 Oct               6.9          -11.5
## 11    2020 Nov               6.7           -2.9
## 12    2020 Dec               6.7            0.0
## 13    2021 Jan               6.3           -6.0
## 14    2021 Feb               6.2           -1.6
## 15    2021 Mar               6.0           -3.2
Cleaning and organizing COVID-19 Health Dataframe
# Dropping unused columns 
covh.cleaned <- subset(covid19.health, select = c(submission_date, state, tot_cases, new_case, tot_death, new_death))

# Renaming columns 
colnames(covh.cleaned) <- c("Date_Reported", "State", "Total_Cases", "New_Cases", "Total_Death", "New_Death")
Transforming the format of dates
# https://cran.r-project.org/web/packages/lubridate/vignettes/lubridate.html

covh.cleaned$Date_Reported <- mdy(covh.cleaned$Date_Reported) # returns year/month/day 

# Result 
head(covh.cleaned$Date_Reported) 
## [1] "2021-04-01" "2020-10-15" "2021-03-16" "2021-04-16" "2020-02-14"
## [6] "2020-08-08"
Creating dataframes for each month
# https://blog.exploratory.io/filter-with-date-function-ce8e84be680

# JANUARY 2020 
jan20.covid.df <- (covh.cleaned %>%
  select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
  filter(Date_Reported <= as.Date("2020-01-31")))

jan20.newcases <- sum(jan20.covid.df$New_Cases)
jan20.deaths <- sum(jan20.covid.df$New_Death)

# FEBRUARY 2020 
feb20.covid.df <- (covh.cleaned %>%
  select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
  filter(Date_Reported >= as.Date("2020-02-01") & Date_Reported <= as.Date("2020-02-28")))

feb20.newcases <- sum(feb20.covid.df$New_Cases)
feb20.deaths <- sum(feb20.covid.df$New_Death)

# MARCH 2020 
mar20.covid.df <- (covh.cleaned %>%
  select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
  filter(Date_Reported >= as.Date("2020-03-01") & Date_Reported <= as.Date("2020-03-31")))

mar20.newcases <- sum(mar20.covid.df$New_Cases)
mar20.deaths <- sum(mar20.covid.df$New_Death)

# APRIL 2020 
apr20.covid.df <- (covh.cleaned %>%
  select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
  filter(Date_Reported >= as.Date("2020-04-01") & Date_Reported <= as.Date("2020-04-30")))

apr20.newcases <- sum(apr20.covid.df$New_Cases)
apr20.deaths <- sum(apr20.covid.df$New_Death)

# MAY 2020 
may20.covid.df <- (covh.cleaned %>%
  select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
  filter(Date_Reported >= as.Date("2020-05-01") & Date_Reported <= as.Date("2020-05-31")))

may20.newcases <- sum(may20.covid.df$New_Cases)
may20.deaths <- sum(may20.covid.df$New_Death)

# JUNE 2020 
jun20.covid.df <- (covh.cleaned %>%
  select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
  filter(Date_Reported >= as.Date("2020-06-01") & Date_Reported <= as.Date("2020-06-30")))

jun20.newcases <- sum(jun20.covid.df$New_Cases)
jun20.deaths <- sum(jun20.covid.df$New_Death)

# JULY 2020 
jul20.covid.df <- (covh.cleaned %>%
  select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
  filter(Date_Reported >= as.Date("2020-07-01") & Date_Reported <= as.Date("2020-07-31")))

jul20.newcases <- sum(jul20.covid.df$New_Cases)
jul20.deaths <- sum(jul20.covid.df$New_Death)

# AUGUST 2020 
aug20.covid.df <- (covh.cleaned %>%
  select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
  filter(Date_Reported >= as.Date("2020-08-01") & Date_Reported <= as.Date("2020-08-31")))

aug20.newcases <- sum(aug20.covid.df$New_Cases)
aug20.deaths <- sum(aug20.covid.df$New_Death)

# SEPTEMBER 2020 
sep20.covid.df <- (covh.cleaned %>%
  select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
  filter(Date_Reported >= as.Date("2020-09-01") & Date_Reported <= as.Date("2020-09-30")))

sep20.newcases <- sum(sep20.covid.df$New_Cases)
sep20.deaths <- sum(sep20.covid.df$New_Death)

# OCTOBER 2020 
oct20.covid.df <- (covh.cleaned %>%
  select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
  filter(Date_Reported >= as.Date("2020-10-01") & Date_Reported <= as.Date("2020-10-31")))

oct20.newcases <- sum(oct20.covid.df$New_Cases)
oct20.deaths <- sum(oct20.covid.df$New_Death)

# NOVEMBER 2020 
nov20.covid.df <- (covh.cleaned %>%
  select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
  filter(Date_Reported >= as.Date("2020-11-01") & Date_Reported <= as.Date("2020-11-30")))

nov20.newcases <- sum(nov20.covid.df$New_Cases)
nov20.deaths <- sum(nov20.covid.df$New_Death)

# DECEMBER 2020 
dec20.covid.df <- (covh.cleaned %>%
  select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
  filter(Date_Reported >= as.Date("2020-12-01") & Date_Reported <= as.Date("2020-12-31")))

dec20.newcases <- sum(dec20.covid.df$New_Cases)
dec20.deaths <- sum(dec20.covid.df$New_Death)

# JANUARY 2021
jan21.covid.df <- (covh.cleaned %>%
  select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
  filter(Date_Reported >= as.Date("2021-01-01") & Date_Reported <= as.Date("2021-01-31")))

jan21.newcases <- sum(jan21.covid.df$New_Cases)
jan21.deaths <- sum(jan21.covid.df$New_Death)

# FEBRUARY 2021
feb21.covid.df <- (covh.cleaned %>%
  select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
  filter(Date_Reported >= as.Date("2021-02-01") & Date_Reported <= as.Date("2021-02-28")))

feb21.newcases <- sum(feb21.covid.df$New_Cases)
feb21.deaths <- sum(feb21.covid.df$New_Death)

# MARCH 2021
mar21.covid.df <- (covh.cleaned %>%
  select(Date_Reported, State, Total_Cases, New_Cases, Total_Death, New_Death) %>%
  filter(Date_Reported >= as.Date("2021-03-01") & Date_Reported <= as.Date("2021-03-31")))

mar21.newcases <- sum(mar21.covid.df$New_Cases)
mar21.deaths <- sum(mar21.covid.df$New_Death)
Creating COVID-19 health dataframes (monthly)
# Creating a variable for months  
months <- c("JAN20", "FEB20", "MAR20", "APR20", "MAY20", "JUN20", "JUL20", "AUG20", "SEPT20", "OCT20", "NOV20", "DEC20", "JAN21", "FEB21", "MAR21")

# Creating a variable for new cases 
newcases <- c(jan20.newcases, feb20.newcases, mar20.newcases, apr20.newcases, may20.newcases, jun20.newcases, jul20.newcases, aug20.newcases, sep20.newcases, oct20.newcases, nov20.newcases, dec20.newcases, jan21.newcases, feb21.newcases, mar21.newcases)

# Creating a variable for deaths 
deaths <- c(jan20.deaths, feb20.deaths, mar20.deaths, apr20.deaths, may20.deaths, jun20.deaths, jul20.deaths, aug20.deaths, sep20.deaths, oct20.deaths, nov20.deaths, dec20.deaths, jan21.deaths, feb21.deaths, mar21.deaths)

# Time series 
dates <- c("01/01/2020", "02/01/2020", "03/01/2020","04/01/2020", "05/01/2020","06/01/2020", "07/01/2020", "08/01/2020", "09/01/2020", "10/01/2020","11/01/2020", "12/01/2020", "01/01/2021", "02/01/2021", "03/01/2021")

# COVID-19 health per month dataframe
monthly.covid.df <- data.frame(months, dates, newcases, deaths)

monthly.covid.df <- monthly.covid.df %>%
  mutate(dates = as.Date(dates, format = "%m/%d/%y"))

# Result
# grid.draw(tableGrob(monthly.covid.df, theme=ttheme_default(base_size = 6) ))
head(monthly.covid.df)
##   months      dates newcases deaths
## 1  JAN20 2020-01-01        7      0
## 2  FEB20 2020-02-01       32      0
## 3  MAR20 2020-03-01   188138   3680
## 4  APR20 2020-04-01   875947  55007
## 5  MAY20 2020-05-01   725178  42053
## 6  JUN20 2020-06-01   847821  21634
Visualize COVID-19 New Cases Timeline
# Find the top 5 highest months
head(arrange(monthly.covid.df,desc(newcases)), n = 5)

# Color labels 
top5.cases <- c("#69b3a2","#69b3a2","#69b3a2","#69b3a2","#69b3a2","#69b3a2",
          "#B20000","#69b3a2","#69b3a2","#B20000","#B20000","#B20000","#B20000","#B20000","#69b3a2")

# Case labels 
num.cases.mils <- c("7", "32", "188K", "875K", "725K","847K", "1.9m", "1.4m", "1.2m", "1.9m", "4.4m", "6.3m", "6m", "2.3m", "1.7m")

# Visualize 
b.cases <- barplot(newcases, yaxp=c(0, max(newcases), 15), 
        ylim=range(pretty(c(0, newcases))), names=months, xlab = "Time Period", ylab = "Number of New Recorded Cases", main = "Number of Newly Recorded COVID-19 Cases per Month in the U.S. 20-21", col = top5.cases)  

y<-as.matrix(newcases)
text(b.cases, y+300000,labels=num.cases.mils)
Visualize COVID-19 Deaths Timeline
# Find the top 5 highest months
head(arrange(monthly.covid.df,desc(deaths)), n = 5)

# Color labels 
deaths.top5 <- c("#90bcff","#90bcff","#90bcff","#B20000","#B20000","#90bcff",
          "#90bcff","#90bcff","#90bcff","#90bcff","#90bcff","#B20000","#B20000","#B20000","#90bcff")

# Case labels 
num.deaths.mils <- c(0, 0, '3,680', '55K', '42K,', '21K', '28K', '29K', '22K', '24K', '39K', '79K', '97K', '63K', '33K')

# Visualize 
b.deaths <- barplot(deaths, yaxp=c(0, max(deaths), 15), 
        ylim=range(pretty(c(0, deaths))), names=months, xlab = "Time Period", ylab = "Number of New Recorded Deaths", main = "Number of Newly Recorded COVID-19 Deaths per Month in the U.S. 20-21", col = deaths.top5) 

y<-as.matrix(deaths)
text(b.deaths ,y+2000, labels=num.deaths.mils)

 

Part 3: Unemployment Data Analysis


Unemployment Analysis and Visualizations

Data

The table below summarizes the unemployment rates and monthly change in unemployment per month from Jan ’20 to Mar ’21. New cases and deaths per month have been calcuated as a proportion of the total cases and deaths from Jan ’20 to Mar ’21 in order to create a line chart with all three variables.

##      Date Unemployment_Rate Monthly_Change   percentcases percentdeaths
## 1   JAN20               3.5           -2.8  0.00002317851     0.0000000
## 2   FEB20               3.5            0.0  0.00010595892     0.0000000
## 3   MAR20               4.4           25.7  0.62296558878     0.6836349
## 4   APR20              14.8          236.4  2.90044987507    10.2186703
## 5   MAY20              13.3          -10.1  2.40122112355     7.8122010
## 6   JUN20              11.1          -16.5  2.80731860893     4.0189560
## 7   JUL20              10.2           -8.1  6.37214414227     5.2478270
## 8   AUG20               8.4          -17.6  4.86821293398     5.4215222
## 9  SEPT20               7.8           -7.1  4.03100172501     4.1476949
## 10  OCT20               6.9          -11.5  6.38421021402     4.4607179
## 11  NOV20               6.7           -2.9 14.57049022953     7.3187949
## 12  DEC20               6.7            0.0 21.17789628105    14.7109692
## 13  JAN21               6.3           -6.0 20.19442534241    18.0373733
## 14  FEB21               6.2           -1.6  7.79635782115    11.7835998
## 15  MAR21               6.0           -3.2  5.87317697682     6.1380385

Now let’s take a look at correlations between unemployment rates, cases, and deaths:

Visualize Unemployment Rates’ Timeline

SUMMARY

Visualize Monthly Change Timeline

SUMMARY

Code

Wrangle data for visualizations
# Create a new unemployment dataframe 
ur.viz <- ur.cleaned.df

# Renaming columns 
colnames(ur.viz) <- c("Date", "Unemployment_Rate", "Monthly_Change")

# Cleaning and wrangling time/date data   
ur.viz$Date <- months

ur.viz$Date <- factor(ur.viz$Date, levels = c("JAN20", "FEB20", "MAR20", "APR20", "MAY20", "JUN20", "JUL20", "AUG20", "SEPT20", "OCT20", "NOV20", "DEC20", "JAN21", "FEB21", "MAR21"))

# Add COVID-19 cases data, calculate contribution percentage from each month to  total cases 
options(scipen = 999)
vector = c()
for (i in 1:length(newcases)) {
  vector <- c(vector, (newcases[i]/sum(newcases)*100))
  formatC(vector[i], digits = 1)
  } 
ur.viz$percentcases <- vector

# Add COVID-19 deaths data, calculate contribution percentage from each month to total deaths 
vector.d = c()
for (i in 1:length(deaths)) {
  vector.d <- c(vector.d, (deaths[i]/sum(deaths)*100))
  formatC(vector.d[i], digits = 1)
  } 
ur.viz$percentdeaths <- vector.d

# Result 
head(ur.viz)
##    Date Unemployment_Rate Monthly_Change  percentcases percentdeaths
## 1 JAN20               3.5           -2.8 0.00002317851     0.0000000
## 2 FEB20               3.5            0.0 0.00010595892     0.0000000
## 3 MAR20               4.4           25.7 0.62296558878     0.6836349
## 4 APR20              14.8          236.4 2.90044987507    10.2186703
## 5 MAY20              13.3          -10.1 2.40122112355     7.8122010
## 6 JUN20              11.1          -16.5 2.80731860893     4.0189560
Visualize Unemployment Rates’ Timeline
# Creating the graph
ggplot(ur.viz, aes(x=Date, group = 1)) + geom_line(aes(y = Unemployment_Rate, group = 1), color = "darkred") + geom_line(aes(y = percentcases, group = 1), color="orange", linetype="twodash") + geom_line(aes(y = percentdeaths, group = 1), color="darkgreen", linetype="twodash") + labs(x = "Time Period", y = "Percentage", caption="Source: Center for Disease Control (CDC) and U.S. Bureau of Economic Analysis (BEA)") + ggtitle("Unemployment Rate Trends in the U.S. 2020-21")  
Visualize Monthly Change Timeline
ggplot(ur.viz, aes(x=Date, group = 1)) + geom_line(aes(y = Monthly_Change, group = 1), color = "darkred") + geom_line(aes(y = percentcases, group = 1), color="orange", linetype="twodash") + geom_line(aes(y = percentdeaths, group = 1), color="darkgreen", linetype="twodash") + labs(x = "Time Period", y = "Change Percentage", caption="Source: Center for Disease Control (CDC) and U.S. Bureau of Economic Analysis (BEA)") + ggtitle("Monthly Change in Unemployment Rates in the U.S. 2020-21")

 

Part 4: GDP Data Analysis


GDP Analysis and Visualizations

SUMMARY

SUMMARY

SUMMARY

Code

Load in libraries
# Load in necessary libraries
library(leaflet)
library(maptools)
library(rgeos)
library(rgdal)
Load in datasets
# This dataset shows the percent change in GDP for each state in the United States from 2017 to 2020.
gdp.state.df <- read.csv("C:/Users/Rinija/Documents/GitHub/economic-impact-of-covid19/Datasets/GDP-State.csv")
state.df <- read.csv("C:/Users/Rinija/Documents/GitHub/economic-impact-of-covid19/Datasets/GDP-State.csv")

# This dataset shows the percent change in GDP for each industry in each state in the United States from 2019 to 2020.
gdp.industry.df <- read.csv("C:/Users/Rinija/Documents/GitHub/economic-impact-of-covid19/Datasets/GDP-Industry.csv")

# This is the shapefile for each state 
states <- readOGR(dsn = "C:/Users/Rinija/Documents/GitHub/economic-impact-of-covid19/Datasets", layer = "cb_2018_us_state_500k")
## Warning in ogrInfo(dsn = dsn, layer = layer, encoding = encoding, use_iconv
## = use_iconv, : ogrInfo: C:\Users\Rinija\Documents\GitHub\economic-impact-of-
## covid19\Datasets/cb_2018_us_state_500k.dbf not found
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\Rinija\Documents\GitHub\economic-impact-of-covid19\Datasets", layer: "cb_2018_us_state_500k"
## with 56 features
## It has 0 fields
Cleaning and organizing GDP-State dataframe
# Renaming columns 
colnames(gdp.state.df) <- c("State", "Y2017", "Y2018", "Y2019", "Y2020", "Rank-Y2020")

# Removing spaces from all rows and pivoting the data to be in long format
gdp.state.df <- gdp.state.df %>% 
  mutate(across(where(is.character), str_remove_all, pattern = fixed(" "))) %>% 
  pivot_longer(cols = Y2017:Y2020, names_to = "Year", values_to = "Change")
Creating dataframes for each state
# California Subset 
ca.gdp <- gdp.state.df[(gdp.state.df$State == "California"), ]

# Texas Subset 
tx.gdp <- gdp.state.df[(gdp.state.df$State == "Texas"), ]

# Rhode Island Subset
ri.gdp <- gdp.state.df[(gdp.state.df$State == "RhodeIsland"), ]

# Utah Subset
ut.gdp <- gdp.state.df[(gdp.state.df$State == "Utah"), ]
Visualization of GDP Percent Changes from 2017-2020
CA <- ggplot(ca.gdp %>%  mutate(fill_value = ifelse(Change > 0, "Increase in GDP", "Decrease in GDP")), aes(x = Year, y = Change, fill = fill_value))+
  geom_bar(stat="identity")+
  scale_fill_manual(values=c("Increase in GDP" = "darkolivegreen3","Decrease in GDP"= "brown1"))+
  ggtitle("Percent Change in GDP from 2017-2020 in California")+
  geom_hline(yintercept = 0)

TX <- ggplot(tx.gdp %>%  mutate(fill_value = ifelse(Change > 0, "Increase in GDP", "Decrease in GDP")), aes(x = Year, y = Change, fill = fill_value))+
  geom_bar(stat="identity")+
  scale_fill_manual(values=c("Increase in GDP" = "darkolivegreen3","Decrease in GDP"= "brown1"))+
  ggtitle("Percent Change in GDP from 2017-2020 in Texas")+
  geom_hline(yintercept = 0)

RI <- ggplot(ri.gdp %>%  mutate(fill_value = ifelse(Change > 0, "Increase in GDP", "Decrease in GDP")), aes(x = Year, y = Change, fill = fill_value))+
  geom_bar(stat="identity")+
  scale_fill_manual(values=c("Increase in GDP" = "darkolivegreen3","Decrease in GDP"= "brown1"))+
  ggtitle("Percent Change in GDP from 2017-2020 in Rhode Island")+
  geom_hline(yintercept = 0)

UT <- ggplot(ut.gdp %>%  mutate(fill_value = ifelse(Change > 0, "Increase in GDP", "Decrease in GDP")), aes(x = Year, y = Change, fill = fill_value))+
  geom_bar(stat="identity")+
  scale_fill_manual(values=c("Increase in GDP" = "darkolivegreen3","Decrease in GDP"= "brown1"))+
  ggtitle("Percent Change in GDP from 2017-2020 in Utah")+
  geom_hline(yintercept = 0)
Visualize United States Change in GDP for 2020
label_2020 <- paste0(
  "<b>2020:</b> ", state.df$X2020, "<br>"
)
paletteBins <- c(-8, -6, -4, -2, 0)
colorPalette <- colorBin(palette = "RdYlGn", domain = state.df$X2020, na.color = "transparent", bins = paletteBins)

leaflet(states) %>% 
  addTiles() %>% 
  setView(lat = 39.8097, lng = -98.5556, zoom=4) %>%
  addPolygons(
    stroke = TRUE, 
    fillColor = ~colorPalette(state.df$X2020),
    color = 'Black', 
    weight = 1.5, 
    label = ~lapply(label_2020, htmltools::HTML)) %>% 
  addLegend(pal = colorPalette, values = state.df$X2020, opacity = 0.9, title = "GDP Percent Change in 2020", position="bottomleft")
Visualize United States Change in GDP for 2019
label_2019 <- paste0(
  "<b>2019:</b> ", state.df$X2019, "<br>"
)
paletteBins <- c(-1, 0, 1, 2, 3, 4, 5, 6)
colorPalette <- colorBin(palette = "Greens", domain = state.df$X2019, na.color = "transparent", bins = paletteBins)

leaflet(states) %>% 
  addTiles() %>% 
  setView(lat = 39.8097, lng = -98.5556, zoom=4) %>%
  addPolygons(
    stroke = TRUE, 
    fillColor = ~colorPalette(state.df$X2019),
    color = 'Black', 
    weight = 1.5, 
    label = ~lapply(label_2019, htmltools::HTML)) %>% 
  addLegend(pal = colorPalette, values = state.df$X2019, opacity = 0.9, title = "GDP Percent Change in 2019", position="bottomleft")

 

Part 5: Summary of Findings


 

Part 6: Resources and Tutorials


United States COVID-19 Cases and Deaths by State over Time Dataset from the U.S. Center for Disease Control and Prevention.

Labor Force Statistics from the Current Population Survey Dataset from the U.S. Bureau of Labor Statistics.

Gross Domestic Product by State, 4th Quarter 2020 and Annual 2020 (Preliminary) from U.S. Bureau of Economic Analysis.

Interactive Choropleth Maps from R Journalism.

R and Leaflet to create interactive choropleth maps from Towardsdatascience.

Filter with Date data from Exploratory.io.

R Markdown: The Definitive Guide from Bookdown.

Lesson 2. Twitter Data in R Using Rtweet: Analyze and Download Twitter Data from EarthLab.

How to Generate Word Clouds in R from Towardsdatascience.

 

Part 7: Further Information


We have created a GitHub Repository with our entire codebase and data sets for future reference.

Please note: a few of our graphs may get cut off once we knit the Rmd file due to spacing limits, however, we have saved PNG images of our graphs and added them to the ‘Graph-Images’ folder in our GitHub Repository so you can see the entire image!